naver lab europe
NAVER LABS Europe's Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track
Edward Gow-Smith, Alexandre Berard, Marcely Zanon Boito, Ioan Calapodescu
This paper presents NAVER LABS Europe's systems for Tamasheq-French and Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track. Our work attempts to maximize translation quality in low-resource settings using multilingual parameter-efficient solutions that leverage strong pre-trained models. Our primary submission for Tamasheq outperforms the previous state of the art by 7.5 BLEU points on the IWSLT 2022 test set, and achieves 23.6 BLEU on this year's test set, outperforming the second-best participant by 7.7 points. For Quechua, we also rank first and achieve 17.7 BLEU, despite having only two hours of translation data. Finally, we show that our proposed multilingual architecture is also competitive for high-resource languages, outperforming the best unconstrained submission to the IWSLT 2021 Multilingual track, despite using much less training data and compute.
MARS: Motion-Augmented RGB Stream for Action Recognition - Naver Labs Europe
This blog post presents our CVPR'19 paper "MARS: Motion-Augmented RGB Stream for Action Recognition", done with the Thoth team at Inria. The code and trained models are available here. Action recognition in videos requires processing both spatial and temporal information and, although CNNs have been quite successful at modeling spatial information, their performance at modeling temporal information has been subpar. Current state-of-the-art techniques use 3D-CNN-based two-stream architectures trained on a large dataset, where one stream processes appearance information from RGB frames while the other handles motion information from optical flow. However, computing optical flow introduces latency in recognizing videos, which limits its use in real-time applications.
Proxy Virtual Worlds VKITTI 2 - Naver Labs Europe
The Virtual KITTI 2 dataset is an adaptation of the Virtual KITTI 1.3.1 dataset as described in the papers below. When using or referring to this dataset in your research, please cite the papers below and cite Naver as the originator of Virtual KITTI 2, an adaptation of Xerox's Virtual KITTI Dataset. Download: We provide one .tar[.gz] archive per type of data as described below. Here is a list of the MD5 checksums for each archive.
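A downloaded archive can be checked against its published MD5 checksum before extraction. The following is a minimal sketch using only Python's standard library; the function names `md5_of_file` and `verify_archive` and the file name in the usage note are hypothetical and chosen for illustration, not taken from the dataset page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use bounded for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum.

    The expected value is stripped and lower-cased so that checksums copied
    from a web page with stray whitespace or upper-case hex still compare.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify_archive("some_archive.tar", "<checksum from the list>")` returns `True` only when the download is intact; both arguments here are placeholders to be replaced with a real archive path and its listed checksum.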
Analyzing Information Flow in Transformers - Naver Labs Europe
Seminars at NAVER LABS Europe are open to the public but space is limited. Abstract: We will discuss what, how and why Transformers learn by analyzing (1) the mechanisms the model uses to encode different kinds of information and (2) how the training objective defines information flow in a model. First, we will start with an in-depth analysis of multi-head attention. Using attribution methods, we will assess the importance of individual heads and will show that the most important heads play interpretable roles. Surprisingly, all the rest of the heads are redundant and, using our novel head-pruning method, can be pruned with almost no loss in translation quality.
Predicting when Machine Learning Models Fail in Production - Naver Labs Europe
More crucially, this expensive maintenance process will continue for as long as one wants decent performance from ML models deployed in production. Motivated by the literature on domain shift and out-of-distribution detection, we propose a method that can predict the performance drop of a model when evaluated on a new target domain, without the need for any labelled examples from this target domain. When done accurately and in real time, this estimation can have an important impact on decisions about debugging and maintaining machine learning models in production. For instance, such insights can drive the decision to annotate more data for retraining or even to adjust models accordingly (e.g.
TAMGU: A new open source programming language to help create, annotate and augment corpora and data. - Naver Labs Europe
Speech recognition and machine translation have entered the lives of millions of people but, to make the machine learning (ML) algorithms behind them work better, it takes a lot of annotated and structured data. One way to get this data is by creating your own using specialized tools, an approach for which Christopher Ré coined the term 'Data Programming'. We now compare corpora annotated by hand by humans, the 'Gold Standard', with 'Silver Standard' data created semi-automatically by artificial means. While Ré's group has produced its own set of tools to do this (called 'Snorkel'), we decided to address the problem from the angle of programming. Having spent many years doing research on formal grammars, I watched these so-called symbolic methods gradually decline in favour of statistical approaches.